Flight Trials Winter 2020 were conducted from 2/17/2020 - 3/10/2020. Soapberry bugs were flight tested twice (T1 vs. T2) for multiple hours in the flight mill and observed from 8 AM to (5-8 PM) each day. I used multicategorical, nominal modeling (multinom) to analyze how sex and host plant as well as changes in mass and egg-laying changed a soapberry bug’s flight response between trials.
Flight Case ~ Mass Diff or Mass Perc + Sex
Flight Case ~ Mass Diff or Mass Perc + Sex + Host Plant
Flight Case ~ Mass Diff or Mass Perc + Sex + Wing2Body
Females Only
Graphs I generated below illustrate how changes in mass (more specifically % mass) or egg-laying led to changes in flight response. To generate these graphs, I ran multicategorical regressions to model the probability of different flight delta cases due to sex, host plant, and changes in mass or egg-laying. Changes in speed and distance as a result of changes in mass or egg-laying were not analyzed here but rather in a separate script.
To perform these analyses, a new variable will be created called “flight_case” which calculates the flight response differences between T1 and T2. Here is a formula and key describing each flight case event and encoding:
Formula : flight case = response in \(T_2\) - response in \(T_1\)
| Event | Encoding |
|---|---|
| flew in both trials | 2 |
| flew in T2 only | 1 |
| flew in neither trials | 0 |
| flew in T1 only | -1 |
Since the outcomes (or response variables) were no longer binomial, I used multicategorical logit models (refer to Chapter 6 of An Introduction to Categorical Data Analysis, Second Edition by Alan Agresti). Below I’ve written out notes on performing these analyses.
What we are interested in is a multicategorical, nominal response variable. When the response variable is nominal, there is no natural order among the response variable categories (unordered categories). When the response variable is multicategorical, its multicategory models assume that the counts in the categories of \(Y\) have a multinomial distribution.
Let J = number of categories for \(Y\).
\(\pi_1,...\pi_J\) = the response probabilities where \(\sum_j \pi_j = 1\).
With \(n\) independent observations, the probability distribution for the number of outcomes of the J types is the multinomial. It specifies the probability for each possible way the \(n\) observations can fall in the J categories. Multicategory logit models simultaneously use all pairs of categories by specifying the odds of outcome in one category instead of another.
Logit models for nominal response variables pair each category with a baseline category. The choice of the baseline category is arbitrary. When the last category (J) is the baseline, the baseline-category logits are
Given that the response falls in category j or category J, this is the log odds that the response is j. For J = 3, for instance, the model uses \(\log(\pi_1/\pi_3)\) and \(\log(\pi_2/\pi_3)\). The baseline-category logit model with a predictor x is
The model has J - 1 equations, with separate parameters for each. The effects vary according to the category paired with the baseline. When J = 2, this model simplifies to a single equation for \(\log(\pi_1/\pi_2) = logit(\pi_1)\), resulting in ordinary logistic regression for binary responses.
So how do these pair of categories determine equations for all other pairs of categories? Here is an arbitrary pair of categories a and b that follows the general equation above,
\(\log (\frac{\pi_a}{\pi_b}) = \log ( \frac{\pi_a/pi_J}{\pi_b/pi_J}) = \log (\frac{\pi_a}{\pi_J}) - \log (\frac{\pi_b}{\pi_J})\)
\(= (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x)\)
\(= (\alpha_a - \alpha_b) + (\beta_a - \beta_b) x\)
So, the equation for categories a and b has the form \(\alpha + \beta x\) with intercept parameter \(\alpha = (\alpha_a - \alpha_b)\) and with slope parameter \(\beta = (\beta_a - \beta_b)\).
Let’s apply it to our data now:
source_path = "~/Desktop/git_repositories/SBB-dispersal/avbernat_working_on/Rsrc/"
script_names = c("center_flight_data.R", # Re-centers data
"clean_flight_data.R", # Loads and cleans data
"unique_flight_data.R",
"get_warnings.R",
"compare_models.R",
"regression_output.R", # Cleans regression outputs; prints in color or B&W
"AICprobabilities.R")
for (script in script_names) {
path = paste0(source_path, script)
source(path)
}
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
output_col = FALSE # Change to TRUE if working in Base R or RStudio; FALSE if generating an HTML
data <- read_flight_data("data/all_flight_data-Winter2020.csv")
data_all <- data[[1]]
data_tested <- data[[2]]
d <- create_delta_data(data_tested)
## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
colnames(d)[c(1:2,5,66:68,71:73)]
## [1] "ID" "sex" "host_plant" "egg_diff" "mass_diff"
## [6] "flew_diff" "mass_per" "flight_case" "egg_case"
Age also plays a role in this dataset, but age was unknown because the soapberry bugs were field collected.
Below I used the multinom function from the nnet package to estimate a multinomial logistic regression model. There are other functions in other R packages capable of multinomial regression. I chose the multinom function because it does not require the data to be reshaped (as the mlogit package does) and to mirror the example code found in Hilbe’s Logistic Regression Models.
First, I chose the baseline by specifying the level using relevel function. Then, I ran our model using multinom. The multinom package does not include p-value calculations for the regression coefficients, so I calculated P using Wald tests (i.e. z-tests).
Finally, I considered mass difference as my predictor variable where a (+) sign means the bug gained mass between trials and a (-) sign means the bug lost mass between trials.
| Event | Sign |
|---|---|
| gained mass from T1 to T2 | pos |
| no mass change between trails | 0 |
| lost mass from T1 to T2 | neg |
Baseline
df <- d %>%
filter(!is.na(mass_diff), !is.na(flight_case))
df <- df[with(df, order(mass_diff)),]
n_trials = nrow(df)
df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")
Null model
null <- multinom(flight_case ~ 1, data = df)
## # weights: 8 (3 variable)
## initial value 385.389832
## iter 10 value 319.269929
## final value 319.269680
## converged
Multinom model: flight_case ~ mass_diff
model <- multinom(flight_case ~ mass_diff, data = df)
## # weights: 12 (6 variable)
## initial value 385.389832
## iter 10 value 317.510452
## iter 20 value 310.451788
## iter 30 value 308.978871
## iter 40 value 308.544941
## iter 50 value 308.493264
## iter 60 value 308.481503
## iter 70 value 308.459043
## iter 80 value 308.445007
## iter 90 value 308.440565
## final value 308.439954
## converged
model_table = calculate_P(model)
##
## AIC: 628.8799
## (Intercept) Estimate DF Std. Err. z wald P > |z|
## -1 -0.770 69.211 6 16.478 4.200 17.641 0.000
## 1 -2.129 6.699 6 28.958 0.231 0.054 0.817
## 2 0.420 25.328 6 12.662 2.000 4.002 0.045
anova(null, model, test="Chisq") # adding mass_diff improves fit
## Likelihood ratio tests of Multinomial Models
##
## Response: flight_case
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 1 831 638.5394
## 2 mass_diff 828 616.8799 1 vs 2 3 21.65945 7.678753e-05
Which multinomial model equations are significant:
get_significant_models()
The multinomial (ML) prediction equations are:
prediction_equations(model_table)
## [1] "log(pi_-1 / pi_1) = 1.36 + 62.51 Mass Change Flew in T1, rather than T2"
## [2] "log(pi_2 / pi_-1) = 1.19 + -43.88 Mass Change Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 2.55 + 18.63 Mass Change Flew in both, rather than T2"
| N | Model | P |
|---|---|---|
| 278 | Flew in T1 only rather than T2 only = 1.36 + 62.51 Mass Change | not sig |
| 278 | Flew in both rather than T1 only = 1.19 - 43.88 Mass Change | sig |
| 278 | Flew in both rather than T2 only = 2.55 + 18.63 Mass Change | not sig |
Estimated Odds
For bugs of mass_diff x + 1/50 (0.02) g, the estimated odds that the bug flew twice rather than only in T1 equals
exp(-43.88/50)
## [1] 0.4157796
times the estimated odds at mass_diff x (g). As gain mass, then less likely to fly again.
exp(coef(model)/50) # this compares to not flying, the baseline | 1/0 not significant
## (Intercept) mass_diff
## -1 0.9847154 3.991693
## 1 0.9583206 1.143376
## 2 1.0084255 1.659584
Predicted probabilities
You can also use predicted probabilities to help you understand the model. The multicategory logit model has an alternative expression in terms of the response probabilities. This is
The denominator is the same for each probability, and the numerators for various j sum to the denominator where \(\sum_j \pi_j = 1\). The parameters (\(\alpha\) and\(\beta\)) equal zero for whichever category is the baseline in the logit expressions. So these would be the equations for the model above:
\(\hat{\pi_{-1}} = \frac{e^{-1.77 + 69.21 x}}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)
\(\hat{\pi_1} = \frac{e^{-2.13 - 6.70 x}}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)
\(\hat{\pi_2} = \frac{e^{0.42 + 25.33 x}}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)
\(\hat{\pi_0} = \frac{e^{0 + 0 x} = 1}{1 + e^{-1.77 + 69.21 x} + e^{-2.13 - 6.70 x} + e^{0.42 + 25.33 x}}, j=1,...,J\)
x = mass_diff, which can give you the estimated probabilities.You can calculate these predicted probabilities for each of our outcome levels using the fitted function. You can start by generating the predicted probabilities for the observations in our dataset and viewing the first few rows. Then, you can plot them.
head(pp <- fitted(model))
## 0 -1 1 2
## 1 0.6268782 0.01288591 0.05518211 0.3050538
## 2 0.5919733 0.01843237 0.05424679 0.3353476
## 3 0.5859452 0.01955212 0.05405531 0.3404474
## 4 0.5612480 0.02470149 0.05318314 0.3608674
## 5 0.5485567 0.02772717 0.05268168 0.3710345
## 6 0.5356441 0.03109397 0.05213548 0.3811264
Multinom model: flight case ~ mass_per
Re-order
df <- df[with(df, order(mass_per)),]
model <- multinom(flight_case ~ mass_per, data = df)
## # weights: 12 (6 variable)
## initial value 385.389832
## iter 10 value 307.053645
## final value 306.984778
## converged
model_table = calculate_P(model)
##
## AIC: 625.9696
## (Intercept) Estimate DF Std. Err. z wald P > |z|
## -1 -0.909 0.044 6 0.010 4.492 20.179 0.000
## 1 -2.132 0.001 6 0.020 0.043 0.002 0.965
## 2 0.344 0.021 6 0.008 2.494 6.221 0.013
Model comparing flying only in T2 to the baseline (not flying at all) is not significant (P = 0.97). That is probably because so few bugs (only male) flew in T2 only.
Significant ML equations
run_multinom_model = function(d) {
m <- multinom(flight_case ~ mass_per, trace=FALSE, data = d)
return(m)
}
get_significant_models()
The multinomial (ML) prediction equations are:
gsub("Mass Change", "Mass Percent Change", prediction_equations(model_table))
## [1] "log(pi_-1 / pi_1) = 1.22 + 0.04 Mass Percent Change Flew in T1, rather than T2"
## [2] "log(pi_2 / pi_-1) = 1.25 + -0.02 Mass Percent Change Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 2.48 + 0.02 Mass Percent Change Flew in both, rather than T2"
| N | Model | P |
|---|---|---|
| 278 | Flew in T1 only rather than T2 only = 1.22 + 0.04 Mass % | sig |
| 278 | Flew in both rather than T1 only = 1.25 - 0.02 Mass % | sig |
| 278 | Flew in both rather than T2 only = 2.48 + 0.02 Mass % | not sig |
Estimated Odds
For bugs of mass_perc x + 20%*(0.04) the estimated odds that bug flew in T1 rather than T2 equals
exp(0.04*20)
## [1] 2.225541
times the estimated odds at mass_perc x (%). So for every 20% increase in a bug’s original mass, the bug is 2.23 times more likely to fly only in T1. That makes sense because the odds of flying in T2 only are very low to begin with.
For bugs of mass_perc x + 40%*(-0.02), the estimated odds that the bug flew twice instead of only in T1 equals
exp(40*-0.02) # 2.5 times less likely to fly twice if gain mass
## [1] 0.449329
exp(-40*-0.02) # 2.5 times more likely to fly twice if loose mass
## [1] 2.225541
times the estimated odds at mass_perc x (%). Similar message - for every 40% increase in a bug’s original mass, the bug was 2.23 times more likely to fly only in T1 rather than fly twice. Although, when a bug looses 40% of original mass then becomes more likely to fly twice then only once.
exp(coef(model)*20) # this compares to no fly, the baseline # gain large % mass
## (Intercept) mass_per
## -1 1.264064e-08 2.415238
## 1 3.047172e-19 1.017407
## 2 9.660792e+02 1.508336
exp(coef(model)*5) # gain small % mass
## (Intercept) mass_per
## -1 1.060333e-02 1.246637
## 1 2.349493e-05 1.004324
## 2 5.575107e+00 1.108216
exp(coef(model)*-5) # loose small % mass
## (Intercept) mass_per
## -1 9.431000e+01 0.8021582
## 1 4.256237e+04 0.9956950
## 2 1.793688e-01 0.9023509
exp(coef(model)*-20) # loose large % mass
## (Intercept) mass_per
## -1 7.910994e+07 0.4140378
## 1 3.281732e+18 0.9828910
## 2 1.035112e-03 0.6629821
# 1/0 not significant
1 vs. 0 is not significant as already stated. However, -1 vs. 0 shows that as gain more mass then become increasingly more likely to only fly once. If loose mass then most likely to not fly at all, and more likely to fly twice then fly only once.
head(pp <- fitted(model))
## 0 -1 1 2
## 1 0.5386483 0.04284419 0.06190317 0.3566043
## 2 0.5356362 0.04375052 0.06158899 0.3590243
## 3 0.5290568 0.04577594 0.06090110 0.3642662
## 4 0.5263205 0.04663701 0.06061436 0.3664282
## 5 0.4982068 0.05614792 0.05764729 0.3879980
## 6 0.4949167 0.05734314 0.05729764 0.3904425
For more on multinomial plotting for when have more than 1 predictor variables see the following: https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/
First: flight_case, mass_diff, and sex
df <- df[with(df, order(mass_diff)),]
data <- data.frame(R = df$flight_case,
A = df$mass_diff,
B = df$sex_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 2 FF.R")
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## [,1] [,2]
## AICs 589.7091 593.7062
## models 3 4
## probs 0.8804523 0.1193253
##
## m3 multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m4 multinom(formula = R ~ A * B, data = data, trace = FALSE)
anova(m3, m4, test="Chisq") # Adding A*B does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B 825 571.7091
## 2 A * B 822 569.7062 1 vs 2 3 2.002834 0.5718186
delta_mass_model <- multinom(flight_case ~ mass_diff + sex_c, data = df)
## # weights: 16 (9 variable)
## initial value 385.389832
## iter 10 value 297.481645
## iter 20 value 287.067261
## iter 30 value 286.119187
## iter 40 value 285.989229
## iter 50 value 285.976242
## iter 60 value 285.937258
## iter 70 value 285.917216
## iter 80 value 285.877940
## iter 80 value 285.877937
## iter 90 value 285.874179
## iter 100 value 285.854534
## final value 285.854534
## stopped after 100 iterations
model_table = calculate_P2(delta_mass_model, "mass_diff", "sex_c")
##
## AIC: 589.7091
## (Intercept) mass_diff sex_c DF SE1 SE2 z1 z2 ML wald
## -1 -0.965 70.907 -0.788 9 17.662 0.219 4.015 -3.590 16.117
## 1 -17.461 -27.383 -16.279 9 67.108 0.184 -0.408 -88.332 0.167
## 2 0.205 20.417 -0.885 9 12.907 0.157 1.582 -5.626 2.502
## sex wald P1 > |z| P2 > |z|
## -1 12.886 0.000 0
## 1 7802.495 0.683 0
## 2 31.647 0.114 0
Significant models
get_significant_models(11) # mass_diff
get_significant_models(12) # sex
prediction_equations2(model_table, " Mass Change", " Sex ")
## [1] "Where F = 1"
## [1] "log(pi_-1 / pi_1) = 16.5 + 98.29 Mass Change + 15.49 Sex Flew in T1, rather than T2"
## [2] "log(pi_2 / pi_-1) = 1.17 + -50.49 Mass Change + -0.1 Sex Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 17.67 + 47.8 Mass Change + 15.39 Sex Flew in both, rather than T2"
| N | Model | P |
|---|---|---|
| 278 | Flew in T1 only rather than T2 only = 16.5 + 98.29 Mass Change + 15.49 Sex | not mass | sex** |
| 278 | Flew in both rather than T1 only = 1.25 - 50.49 Mass Change - 0.1 Sex | mass** | not sex |
| 278 | Flew in both rather than T2 only = 17.67 + 47.8 Mass Change + 15.39 Sex | not mass | sex** |
Second: flight_case, mass_per, and sex
df <- df[with(df, order(mass_per)),]
data <- data.frame(R = df$flight_case,
A = df$mass_per,
B = df$sex_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 2 FF.R")
## [,1] [,2]
## AICs 587.5607 593.4209
## models 3 4
## probs 0.9492355 0.05068255
##
## m3 multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m4 multinom(formula = R ~ A * B, data = data, trace = FALSE)
anova(m3, m4, test="Chisq") # Adding A*B does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B 825 569.5607
## 2 A * B 822 569.4209 1 vs 2 3 0.1398496 0.9866598
delta_mass_model <- multinom(flight_case ~ mass_per + sex_c, data = df)
## # weights: 16 (9 variable)
## initial value 385.389832
## iter 10 value 286.869825
## iter 20 value 284.809036
## iter 30 value 284.797822
## final value 284.780360
## converged
model_table = calculate_P2(delta_mass_model, "mass_per", "sex_c")
##
## AIC: 587.5607
## (Intercept) mass_per sex_c DF SE1 SE2 z1 z2 ML wald sex wald
## -1 -1.015 0.043 -0.692 9 0.010 0.203 4.390 -3.408 19.272 11.617
## 1 -6.820 -0.009 -5.626 9 0.026 0.183 -0.348 -30.721 0.121 943.764
## 2 0.124 0.019 -0.902 9 0.008 0.159 2.334 -5.684 5.447 32.310
## P1 > |z| P2 > |z|
## -1 0.000 0.001
## 1 0.728 0.000
## 2 0.020 0.000
prediction_equations2(model_table, " Mass Percent Change", " Sex ")
## [1] "Where F = 1"
## [1] "log(pi_-1 / pi_1) = 5.81 + 0.05 Mass Percent Change + 4.93 Sex Flew in T1, rather than T2"
## [2] "log(pi_2 / pi_-1) = 1.14 + -0.02 Mass Percent Change + -0.21 Sex Flew in both, rather than T1"
## [3] "log(pi_2 / pi_1) = 6.94 + 0.03 Mass Percent Change + 4.72 Sex Flew in both, rather than T2"
Significant models
run_multinom_model = function(d) {
m <- multinom(flight_case ~ mass_per + sex_c, trace=FALSE, data = d)
model_table = calculate_P2(m, "mass_diff", "sex_c", print_table=FALSE)
return(model_table)
}
get_significant_models(11)
get_significant_models(12)
| N | Model | P |
|---|---|---|
| 278 | Flew in T1 only rather than T2 only = 5.81 + 0.05 Mass % + 4.93 Sex | mass** | sex** |
| 278 | Flew in both rather than T1 only = 1.14 - 0.02 Mass % - 0.21 Sex | mass** | not sex |
| 278 | Flew in both rather than T2 only = 6.94 + 0.03 Mass % + 4.72 Sex | not mass | sex** |
Estimated Odds
For males:
For bugs of mass_perc x + 20%*(0.05), the estimated odds that bug flew in T1 rather than T2 equals
exp(0.05*20)
## [1] 2.718282
times the estimated odds at mass_perc x (%). So for every 20% increase in a bug’s original mass, the bug was 2.72 times more likely to fly only in T1. That makes sense because the odds of flying in T2 only are very low to begin with.
For bugs of mass_perc x + 20*(-0.02%), the estimated odds that the bug flew twice instead of only in T1 equals
exp(20*-0.02) # 20 times less likely to fly twice if gain mass
## [1] 0.67032
exp(-20*-0.02) # 20 times more likely to fly twice if loose mass
## [1] 1.491825
times the estimated odds at mass_perc x (%). Similar message - for every 20% increase in a bug’s original mass, the bug was 1.5 times more likely to fly only in T1. So, when a bug starts loosing mass it becomes more likely to fly twice then only once.
For females:
For bugs of mass_perc x + 20%*(0.05), the estimated odds that bug flew in T1 rather than T2 equals
exp(0.05*20 + 4.93)
## [1] 376.1545
times the estimated odds at mass_perc x (%). So for every 20% increase in a bug’s original mass, the bug was 376 times more likely to fly only in T1. Crazy! It doesn’t take much for a female to only fly once.
For bugs of mass_perc x + 20%*(0.05), the estimated odds that bug flew in both trials rather than T1 equals
exp(-0.02*20 - 0.21)
## [1] 0.5433509
exp(-0.02*-40 - 0.21)
## [1] 1.803988
times the estimated odds at mass_perc x (%). So half as likely to fly in both trials if gain mass. Would need to loose a lot of mass (40) to be almost twice as likely (1.8) to fly in both trials.
exp(coef(delta_mass_model)*20) # this compares to no fly, the baseline # gain large % mass
## (Intercept) mass_per sex_c
## -1 1.513930e-09 2.3406655 9.798741e-07
## 1 5.730838e-60 0.8325593 1.367423e-49
## 2 1.185738e+01 1.4736071 1.475858e-08
exp(coef(delta_mass_model)*5) # gain small % mass
## (Intercept) mass_per sex_c
## -1 6.237728e-03 1.2369007 3.146245e-02
## 1 1.547229e-15 0.9552209 6.081010e-13
## 2 1.855655e+00 1.1017814 1.102202e-02
exp(coef(delta_mass_model)*-5) # loose small % mass
## (Intercept) mass_per sex_c
## -1 1.603148e+02 0.8084723 3.178392e+01
## 1 6.463168e+14 1.0468783 1.644464e+12
## 2 5.388934e-01 0.9076211 9.072748e+01
exp(coef(delta_mass_model)*-20) # loose large % mass
## (Intercept) mass_per sex_c
## -1 6.605326e+08 0.4272289 1.020539e+06
## 1 1.744946e+59 1.2011156 7.313027e+48
## 2 8.433568e-02 0.6786069 6.775721e+07
# 1/0 mass not sig but sex**
Predicted Probabilities
head(pp <- fitted(delta_mass_model))
## 0 -1 1 2
## 1 0.7917303 0.03003973 4.362037e-06 0.1782256
## 2 0.7894639 0.03073036 4.325625e-06 0.1798015
## 3 0.7844677 0.03228066 4.247094e-06 0.1832474
## 4 0.7823713 0.03294258 4.214827e-06 0.1846819
## 5 0.7601763 0.04036342 3.895616e-06 0.1994564
## 6 0.7574981 0.04130977 3.859619e-06 0.2011882
df$index = 1:nrow(df)
females = df %>%
filter(sex=="F")
males = df %>%
filter(sex=="M")
Frows = females$index
Mrows = males$index
No females flew in T2 only.
df <- df[with(df, order(mass_diff)),]
data <- data.frame(R = df$flight_case,
A = df$mass_diff,
B = df$sex_c,
C = df$host_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 3 FF.R")
## [,1] [,2] [,3] [,4] [,5]
## AICs 589.7091 591.9755 593.0435 593.7062 593.9087
## models 4 12 13 8 7
## probs 0.5142205 0.1655767 0.09707198 0.06969088 0.0629815
##
## m4 multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m12 multinom(formula = R ~ A * C + B, data = data, trace = FALSE)
## m13 multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
## m8 multinom(formula = R ~ A * B, data = data, trace = FALSE)
## m7 multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
anova(m4, m7, test="Chisq") # Adding C (host plant) does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B 825 571.7091
## 2 A + B + C 822 569.9087 1 vs 2 3 1.800378 0.6148527
Host plant is not significant.
df <- df[with(df, order(mass_per)),]
data <- data.frame(R = df$flight_case,
A = df$mass_per,
B = df$sex_c,
C = df$wing2body)
model_script = paste0(source_path,"generic multinomial models- multinom 1RF + 3 FF.R")
model_comparisonsAIC(model_script)
## [,1] [,2] [,3]
## AICs 582.3093 585.1022 587.1333
## models 7 12 13
## probs 0.6639449 0.1643044 0.05951306
##
## m7 multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
## m12 multinom(formula = R ~ A * C + B, data = data, trace = FALSE)
## m13 multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
anova(m7, m12, test="Chisq") # adding A*C does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B + C 822 558.3093
## 2 A * C + B 819 555.1022 1 vs 2 3 3.207044 0.3607914
anova(m7, m13, test="Chisq") # Adding B*C does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B + C 822 558.3093
## 2 B * C + A 819 557.1333 1 vs 2 3 1.175993 0.7587677
calculate_P3 = function(m, print_table=TRUE) {
# compute p-values using Wald tests (i.e. z-tests)
s <- summary(m)
z <- s$coefficients/s$standard.errors
wald <- z^2
p <- (1 - pnorm(abs(z), 0, 1)) * 2
# summary table (similar to lm, glm, or lmer function)
model_table <- cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[,2:4], z[,2:4], wald[,2:4], p[,2:4])
colnames(model_table) <- c("(Intercept)", "mass %", "sex","wing2body","DF",
"SE1", "SE2", "SE3", "z1", "z2", "z3",
"wald2","wald1", "wald3", "P1>|z|", "P2>|z|", "P3>|z|")
if (print_table){
cat("\n", "AIC: ", s$AIC, "\n")
print(round(model_table,3))
}
return(model_table)
}
model <- multinom(flight_case ~ mass_per + sex_c + wing2body, data = df)
## # weights: 20 (12 variable)
## initial value 385.389832
## iter 10 value 291.096706
## iter 20 value 280.669331
## iter 30 value 279.898964
## iter 40 value 279.532350
## iter 50 value 279.155239
## final value 279.154632
## converged
model_table = calculate_P3(model)
##
## AIC: 582.3093
## (Intercept) mass % sex wing2body DF SE1 SE2 SE3 z1 z2
## -1 -17.862 0.041 -0.571 23.558 12 0.010 0.212 12.064 4.256 -2.697
## 1 -4.395 -0.005 -9.580 -8.937 12 0.025 6.669 18.679 -0.191 -1.436
## 2 -19.931 0.018 -0.760 28.019 12 0.008 0.166 9.729 2.145 -4.587
## z3 wald2 wald1 wald3 P1>|z| P2>|z| P3>|z|
## -1 1.953 18.110 7.273 3.813 0.000 0.007 0.051
## 1 -0.478 0.036 2.063 0.229 0.849 0.151 0.632
## 2 2.880 4.600 21.039 8.293 0.032 0.000 0.004
Signal of flying in only T2 is weak, which is probably why the equation in the model was not significant. It does not differ much from not flying at all.
Significant equations
run_multinom_model = function(d) {
m <- multinom(flight_case ~ mass_per + sex_c + wing2body, trace=FALSE, data = d)
model_table = calculate_P3(m, print_table=FALSE)
return(model_table)
}
get_significant_models(15) # mass%
get_significant_models(16) # sex
get_significant_models(17) # wing2body
Predicted Probabilities
head(pp <- fitted(model))
## 0 -1 1 2
## 1 0.7474703 0.03578853 1.245735e-09 0.2167412
## 2 0.8118293 0.02849196 1.532139e-09 0.1596788
## 3 0.6989358 0.04308981 1.080435e-09 0.2579744
## 4 0.8015113 0.03105505 1.486385e-09 0.1674336
## 5 0.7665014 0.04013877 1.345112e-09 0.1933599
## 6 0.7891792 0.03740734 1.448068e-09 0.1734135
Amazing to see what a clear impact wing2body ratio has for whether bugs are more likely to fly twice or not fly at all. Those with wing2body ratios above the mean wing2body ratio appear to be more likely to fly twice and less likely to not fly. This is true across females and males.
Baseline
df <- d %>%
filter(!is.na(egg_diff), !is.na(mass_diff), !is.na(flew_diff), sex_c == 1)
df <- df[with(df, order(mass_diff)),]
n_tfemales = nrow(df)
df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")
Barplot
Egg Case
| Event | Encoding |
|---|---|
| laid eggs in both trials | 2 |
| laid eggs in T2 only | 1 |
| laid eggs in neither trials | 0 |
| laid eggs in T1 only | -1 |
Baseline
df <- d %>%
filter(!is.na(egg_case), !is.na(mass_diff), !is.na(flew_diff), sex_c == 1)
df <- df[with(df, order(mass_diff)),]
n_tfemales = nrow(df)
df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")
Null model
df$flight_case = droplevels(df$flight_case) # no female bug only flew in T2
null <- multinom(flight_case ~ 1, data = df)
## # weights: 6 (2 variable)
## initial value 102.170943
## final value 93.055466
## converged
Multinom model: flew_diff ~ egg_case
model <- multinom(flight_case ~ egg_case, data = df)
## # weights: 9 (4 variable)
## initial value 102.170943
## final value 84.114906
## converged
model_table = calculate_P(model)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
##
## AIC: 176.2298
## (Intercept) Estimate DF Std. Err. z wald P > |z|
## -1 -0.194 -0.656 4 0.345 -1.902 3.616 0.057
## 2 0.697 -1.189 4 0.314 -3.782 14.306 0.000
anova(null, model, test="Chisq") # adding egg_case improves fit
## Likelihood ratio tests of Multinomial Models
##
## Response: flight_case
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 1 184 186.1109
## 2 egg_case 182 168.2298 1 vs 2 2 17.88112 0.0001309677
Significant models
get_significant_modelsf(7) #-1/0 egg case not sig | 0/-1 or 2/-1 egg case not sig | -1/2 egg case not sig | but 0/2 and 2/0 egg case**
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
Comparing models: adding host
Adding host with egg_diff and mass_diff
| Host | Encoding |
|---|---|
| Golden Rain Tree (GRT) | 1 |
| Balloon Vine (BV) | -1 |
data <- data.frame(R = df$flight_case,
A = df$egg_case,
B = df$mass_diff,
C = df$host_c)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 3 FF.R")
## [,1] [,2] [,3] [,4] [,5]
## AICs 165.1826 166.6029 167.0497 167.9211 167.9513
## models 7 13 4 11 16
## probs 0.3572256 0.1756085 0.1404487 0.09084268 0.08948043
##
## m7 multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
## m13 multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
## m4 multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m11 multinom(formula = R ~ A * B + C, data = data, trace = FALSE)
## m16 multinom(formula = R ~ B * C + A * B, data = data, trace = FALSE)
anova(m4, m7, test="Chisq") # Adding C does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B 180 155.0497
## 2 A + B + C 178 149.1826 1 vs 2 2 5.86705 0.05320915
anova(m7, m13, test="Chisq") # Adding mass_diff*host does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B + C 178 149.1826
## 2 B * C + A 176 146.6029 1 vs 2 2 2.579779 0.2753012
model <- multinom(flight_case ~ mass_diff + egg_case, data = df)
## # weights: 12 (6 variable)
## initial value 102.170943
## iter 10 value 81.763896
## iter 20 value 77.915648
## iter 30 value 77.575673
## iter 40 value 77.528423
## iter 50 value 77.524883
## final value 77.524846
## converged
model_table = calculate_P2(model, "mass_diff", "egg_case")
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
##
## AIC: 167.0497
## (Intercept) mass_diff egg_case DF SE1 SE2 z1 z2 ML wald sex wald
## -1 -0.879 57.434 -0.534 6 17.992 0.384 3.192 -1.391 10.190 1.935
## 2 0.527 18.699 -1.089 6 14.416 0.296 1.297 -3.674 1.682 13.500
## P1 > |z| P2 > |z|
## -1 0.001 0.164
## 2 0.195 0.000
Significant eqs
run_multinom_model = function(d) {
m <- multinom(flight_case ~ mass_diff + egg_case, trace=FALSE, data = d)
model_table = calculate_P2(m, "mass_diff", "egg_case", print_table=FALSE)
return(model_table)
}
get_significant_modelsf(11) # mass diff
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
get_significant_modelsf(12) # egg_case
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
## Warning in cbind(s$coefficients, c(s$edf, s$edf, s$edf), s$standard.errors[, :
## number of rows of result is not a multiple of vector length (arg 2)
tb = model_table
# -1 / 2 | Flew in T1, not T2
I = (tb[1,1] - tb[2,1])
M = (tb[1,2] - tb[2,2])
E = (tb[1,3] - tb[2,3])
EQ1 = paste0("log(pi_-1 / pi_2) = ", round(I, 2), " + ", round(M,2), " Mass Diff", " + ", round(E, 2), " Egg Case", " Flew in T1, rather than both" )
EQ1 # mass_dff** | sex not
## [1] "log(pi_-1 / pi_2) = -1.41 + 38.73 Mass Diff + 0.56 Egg Case Flew in T1, rather than both"
Predicted Probabilities
head(pp <- fitted(model))
## 0 -1 2
## 1 0.9145007 0.009854788 0.07564450
## 2 0.2856071 0.021532897 0.69286000
## 3 0.7647877 0.021005983 0.21420635
## 4 0.7482798 0.025860598 0.22585957
## 5 0.8863560 0.020152630 0.09349132
## 6 0.8810560 0.022470525 0.09647351
plot6(df5,pp5)
wing2body
Baseline
df <- d %>%
filter(!is.na(egg_case), !is.na(mass_diff), !is.na(flew_diff), !is.na(wing2body), sex_c == 1)
df <- df[with(df, order(mass_diff)),]
n_tfemales = nrow(df)
df$flight_case <- relevel(as.factor(df$flight_case), ref = "0")
data <- data.frame(R = df$flight_case,
A = df$egg_case,
B = df$mass_diff,
C = df$wing2body)
source("src/compare_models.R")
model_comparisonsAIC("src/generic multinomial models- multinom 1RF + 3 FF.R") # strange error...
## [,1] [,2] [,3] [,4] [,5] [,6]
## AICs 165.9791 167.0497 168.793 169.6042 169.9652 169.9733
## models 7 4 12 11 13 8
## probs 0.4170701 0.2441983 0.1021376 0.0680834 0.05683876 0.05661042
##
## m7 multinom(formula = R ~ A + B + C, data = data, trace = FALSE)
## m4 multinom(formula = R ~ A + B, data = data, trace = FALSE)
## m12 multinom(formula = R ~ A * C + B, data = data, trace = FALSE)
## m11 multinom(formula = R ~ A * B + C, data = data, trace = FALSE)
## m13 multinom(formula = R ~ B * C + A, data = data, trace = FALSE)
## m8 multinom(formula = R ~ A * B, data = data, trace = FALSE)
anova(m4, m7, test="Chisq") # adding wing2body does not include fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B 180 155.0497
## 2 A + B + C 178 149.9791 1 vs 2 2 5.070548 0.07924002
anova(m7, m12, test="Chisq") # Adding A*C does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B + C 178 149.9791
## 2 A * C + B 176 148.7930 1 vs 2 2 1.186134 0.5526299
anova(m7, m11, test="Chisq")
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B + C 178 149.9791
## 2 A * B + C 176 149.6042 1 vs 2 2 0.3749583 0.8290464
anova(m7, m13, test="Chisq")
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B + C 178 149.9791
## 2 B * C + A 176 149.9652 1 vs 2 2 0.01392852 0.9930599
anova(m4, m8, test="Chisq") # Adding A*B does not improve fit
## Likelihood ratio tests of Multinomial Models
##
## Response: R
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 A + B 180 155.0497
## 2 A * B 178 153.9733 1 vs 2 2 1.076425 0.5837908
wing2body is not in the top model for females
| Event | Encoding |
|---|---|
| flew in both trials | 2 |
| flew in T2 only | 1 |
| flew in neither trials | 0 |
| flew in T1 only | -1 |
| Event | Sign |
|---|---|
| gained mass from T1 to T2 | pos |
| no mass change between trails | 0 |
| lost mass from T1 to T2 | neg |
| Host | Encoding |
|---|---|
| Golden Rain Tree (GRT) | 1 |
| Balloon Vine (BV) | -1 |
Females Only
| Event | Encoding |
|---|---|
| flew in both trials | 2 |
| flew in neither trial | 0 |
| flew in T1 only | -1 |
| Event | Encoding |
|---|---|
| laid eggs in both trials | 2 |
| laid eggs in T2 only | 1 |
| laid eggs in neither trials | 0 |
| laid eggs in T1 only | -1 |
flight case ~ mass diff
flight case ~ mass per
flight case ~ mass per and sex
wing2body
females only
flight case ~ mass diff and egg case